Data Mining with League of Legends

The project in a few points

What is League of Legends?

More about the different roles: check this
More about the champions: check this

Goals

The main goal of the project is to compute statistics on how likely a team is to succeed depending on the heroes it picks or bans. The major challenge is to compute the best combination of heroes.
Learn more about the rules of League of Legends here:

Motivations

Each year Riot Games hosts professional League of Legends tournaments: the League of Legends World Championship and the League of Legends Championship Series for Europe and North America. Thanks to these events, Riot Games earned $1 billion in microtransaction revenue in 2014. Furthermore, in the same year Riot Games reported that 27 million people played League of Legends daily. This shows the importance of analysing and mining the data of such a game.

The team

Our team is made up of four final-year IT engineering students who are passionate about big data issues. They are currently enrolled in the Department of Engineering and Applications of Big Data.

The data

The data (*) we analyzed was organized by two researchers at the Université de Lyon. We worked on 10% of this data, that is to say 60 GB. It contains numerous attributes for each match, and we selected the ones we judged significant for prediction. These attributes are: the match id, the banned hero id, the picked hero id, the team id, and the winner id.
(*) The collected data can be found at the link below.

The process

First, we parsed the data to keep only the attributes we needed. Then we indexed this data with Lucene so that we had easy access to each element, and wrote the relevant data to a text file. This reduced the initial 60 GB to about 3 MB. Because the order in which heroes are selected matters, we chose to do sequence mining, which is why we used SPMF (*).
Sequence mining requires a unique encoding for each item, so we built a translator program and a dictionary to make sure we can do the reverse translation. The main idea is that a champion can be either banned or picked, and by either team, so each champion has 4 potential positions. One more constraint is that the encoding has to be consecutive.
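The encoding described above can be sketched as follows. This is a minimal illustration, not the project's actual translator: the function names, the `team1`/`team2` labels, and the champion ids are assumptions chosen for the example. It assigns four consecutive integer codes to each champion (one per team and action), keeps a reverse dictionary, and formats a draft as one line of SPMF's sequence input format, where each itemset is followed by `-1` and the sequence ends with `-2`.

```python
from itertools import product

# Hypothetical labels for the four positions a champion can occupy.
SLOTS = [("team1", "ban"), ("team1", "pick"), ("team2", "ban"), ("team2", "pick")]


def build_dictionary(champion_ids):
    """Map every (champion, team, action) triple to a consecutive integer code.

    Returns the forward dictionary and its inverse for reverse translation.
    """
    triples = ((champ, team, action)
               for champ, (team, action) in product(sorted(champion_ids), SLOTS))
    encode = {triple: code for code, triple in enumerate(triples, start=1)}
    decode = {code: triple for triple, code in encode.items()}
    return encode, decode


def to_spmf_line(events, encode):
    """Format one ordered draft as an SPMF sequence line."""
    parts = [f"{encode[event]} -1" for event in events]
    return " ".join(parts + ["-2"])


# Illustrative champion ids only; real ids come from the match data.
encode, decode = build_dictionary([266, 17, 103])
draft = [(17, "team1", "ban"), (103, "team2", "pick")]
print(to_spmf_line(draft, encode))  # 1 -1 8 -1 -2
print(decode[8])                    # (103, 'team2', 'pick')
```

Sorting the champion ids before numbering keeps the codes stable between runs, so a dictionary built once can still decode the mined sequences later.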
Afterwards we compute our sequences and keep only the most recurrent ones. We obtained around 40,000 recurrent sequences. From these, we can check whether the current selection made by the user is one of them, or at least part of one of them, because our algorithm makes sure that we check all of the possible ordered combinations.
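One way to picture the check described above is an ordered subsequence test. The sketch below is an assumption about how such a check could look, not the project's code: the function names and the integer item codes are hypothetical, and gaps between the selected items are allowed as long as the order is preserved.

```python
def is_subsequence(selection, sequence):
    """Return True if the items of `selection` appear in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items can only be found after earlier ones.
    return all(item in remaining for item in selection)


def matching_sequences(selection, recurrent_sequences):
    """Return the recurrent sequences that contain `selection` as an ordered part."""
    return [seq for seq in recurrent_sequences if is_subsequence(selection, seq)]


# Hypothetical encoded sequences, for illustration only.
recurrent = [(1, 8, 10), (1, 10), (8, 1, 10)]
print(matching_sequences((1, 10), recurrent))  # all three sequences match
print(matching_sequences((8, 1), recurrent))   # only (8, 1, 10) matches
```

With around 40,000 recurrent sequences, a linear scan like this stays cheap; an index would only become necessary for much larger result sets.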
Finally, each combination is associated with a balance value and a support value, and that is precisely what we are interested in. Another approach we studied but did not put into practice is deep learning using recursive neural networks.
(*) : SPMF

Results

The final output is the balance value and the support. Each balance value is given by this program. It is a measure meant to evaluate whether a team composition is likely to win or lose: the higher this value, the more likely your team composition (including bans) is to win.
The second measure is the support value. The higher the support, the more relevant the balance value, so it is important that the support be high. For our data, a support value of around 5 or 10 is already very good: it means that around 5 to 10 recurrent sequences are found in your current selection.

Perspectives

We would have liked to improve the balance value, because it is not an explicit value and is not easy to interpret. We would also have liked to fully implement the RNN approach, and to be able to make predictions instead of only a retrospective analysis.

Demo Presentation

Single Project.